In this seminar, we're going to play with Tensorflow and see how it helps you build deep learning models.
If you're running this notebook outside the course environment, you'll need to install Tensorflow v1:
pip install tensorflow==1.15.2
should install CPU-only TF on Linux & Mac OS.

pip install tensorflow-gpu==1.15.2

might or might not work, depending on your CUDA and driver setup.
In [ ]:
import sys, os
if 'google.colab' in sys.modules:
    %tensorflow_version 1.x

    if not os.path.exists('.setup_complete'):
        !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/spring20/setup_colab.sh -O- | bash
        !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/spring20/week04_[recap]_deep_learning/mnist.py
        !touch .setup_complete

# This code creates a virtual display to draw game images on.
# It will have no effect if your machine has a monitor.
if type(os.environ.get("DISPLAY")) is not str or len(os.environ.get("DISPLAY")) == 0:
    !bash ../xvfb start
    os.environ['DISPLAY'] = ':1'
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
import tensorflow as tf
# a session is the main tensorflow object. You ask the session to compute stuff for you.
sess = tf.InteractiveSession()
In [ ]:
def sum_squares(N):
    # compute 0^2 + 1^2 + ... + (N-1)^2 with plain numpy
    return <YOUR CODE: student.implement_me()>
In [ ]:
%%time
sum_squares(10**8)
Same with tensorflow
In [ ]:
# "i will insert N here later"
N = tf.placeholder('int64', name="input_to_your_function")
# a recipe on how to produce {sum of squares of arange of N} given N
result = tf.reduce_sum((tf.range(N)**2))
In [ ]:
%%time
# dear session, compute the result please. Here's your N.
print(sess.run(result, {N: 10**8}))
# hint: run it several times to let tensorflow "warm up"
The general pattern: sess.run(outputs, {placeholder1: value1, placeholder2: value2})
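For instance, a minimal sketch of that pattern with two scalar placeholders (the names here are purely illustrative, not part of the exercise):
In [ ]:
# toy illustration of the sess.run(...) pattern above
a = tf.placeholder('float32', name='a')
b = tf.placeholder('float32', name='b')
a_plus_b = a + b
print(sess.run(a_plus_b, {a: 2.0, b: 3.0}))  # -> 5.0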
Still confused? We're going to fix that.
Placeholders and constants
In [ ]:
# placeholder that can be an arbitrary float32 scalar, vector, matrix, etc.
arbitrary_input = tf.placeholder('float32')
# input vector of arbitrary length
input_vector = tf.placeholder('float32', shape=(None,))
# input vector that _must_ have 10 elements and integer type
fixed_vector = tf.placeholder('int32', shape=(10,))
# you can generally use None whenever you don't need a specific shape
input1 = tf.placeholder('float64', shape=(None, 100, None))
input2 = tf.placeholder('int32', shape=(None, None, 3, 224, 224))
You can create new tensors with arbitrary operations on placeholders, constants and other tensors.
a + b, a / b, a ** b, ...
behave just like in numpy. There's a ton of other stuff in tensorflow; see the docs or learn as you go with Shift+Tab.
In [ ]:
# elementwise multiplication
double_the_vector = input_vector * 2
# elementwise cosine
elementwise_cosine = tf.cos(input_vector)
# elementwise difference between the squared vector and its mean - with some random salt
vector_squares = input_vector ** 2 - \
    tf.reduce_mean(input_vector) + tf.random_normal(tf.shape(input_vector))
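The heading above also mentions constants: unlike a placeholder, a tf.constant holds a fixed value baked into the graph, so nothing has to be fed at run time. A quick sketch (the names are illustrative):
In [ ]:
# constants hold fixed values that are part of the graph itself
pi = tf.constant(3.14159, name='pi')
two_pi = pi * 2.0
print(sess.run(two_pi))  # no feed_dict required for constants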
Some simple mathematical functions have surprisingly cool plots (inspired by this post). For one, consider this:
$$ x(t) = t - 1.5 \cdot \cos(15 t) $$
$$ y(t) = t - 1.5 \cdot \sin(16 t) $$
In [ ]:
t = tf.placeholder('float32')
# compute x(t) and y(t) as defined above.
x = <YOUR CODE>
y = <YOUR CODE>
x_points, y_points = sess.run([x, y], {t: np.linspace(-10, 10, num=10000)})
plt.plot(x_points, y_points)
It's often useful to visualize the computation graph when debugging or optimizing. Interactive visualization is where tensorflow really shines as compared to other frameworks.
There's a special tool for that, called Tensorboard. You can launch it from the console:
tensorboard --logdir=/tmp/tboard --port=7007
If you're pathologically afraid of consoles, try this:
import os; os.system("tensorboard --logdir=/tmp/tboard --port=7007 &")
(but don't tell anyone we taught you that)
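Tensorboard reads the graph from an events file, so the graph still has to be written into that log directory first. A minimal sketch, assuming the /tmp/tboard path used above:
In [ ]:
# write the current default graph so tensorboard has something to draw
writer = tf.summary.FileWriter('/tmp/tboard', graph=sess.graph)
writer.flush()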
One basic functionality of tensorboard is drawing graphs. Once you've run the cell above, go to localhost:7007 in your browser and switch to the Graphs tab in the top bar.
Here's what you should see:
Tensorboard also allows you to draw graphs (e.g. learning curves), record images & audio and play flash games. This is useful when monitoring learning progress and catching some training issues.
One researcher said:
If you spent the last four hours of your worktime watching your algorithm print numbers and draw figures, you're probably doing deep learning wrong.
You can read more on tensorboard usage here
In [ ]:
# Quest #1 - implement a function that computes a mean squared error of two input vectors
# Your function has to take 2 vectors and return a single number
<YOUR CODE: student.define_inputs_and_transformations()>
mse = <YOUR CODE: student.define_transformation()>
compute_mse = lambda vector1, vector2: sess.run( <YOUR CODE: how to run your graph?> , {})
In [ ]:
# Tests
from sklearn.metrics import mean_squared_error
for n in [1, 5, 10, 10 ** 3]:
    elems = [np.arange(n), np.arange(n, 0, -1), np.zeros(n),
             np.ones(n), np.random.random(n), np.random.randint(100, size=n)]
    for el in elems:
        for el_2 in elems:
            true_mse = np.array(mean_squared_error(el, el_2))
            my_mse = compute_mse(el, el_2)
            if not np.allclose(true_mse, my_mse):
                print('Wrong result:')
                print('mse(%s, %s)' % (el, el_2))
                print("should be: %f, but your function returned %f" %
                      (true_mse, my_mse))
                raise ValueError("Something is wrong")

print("All tests passed")
The inputs and transformations have no value outside of a sess.run(...) call. That's a bit unnatural if you want your model to have parameters (e.g. network weights) that are always present, but can change their value over time.
Tensorflow solves this with tf.Variable objects: a variable keeps its value between runs, you can assign it a new value at any time, and, unlike placeholders, you don't need to feed it when sess.run(...)-ing.
In [ ]:
# creating shared variable
shared_vector_1 = tf.Variable(initial_value=np.ones(5))
# initialize all variables with initial values
sess.run(tf.global_variables_initializer())
In [ ]:
# evaluating a shared variable (outside the symbolic graph)
print("initial value", sess.run(shared_vector_1))

# within the symbolic graph you use them just like any other input or transformation, no "get value" needed
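To illustrate the comment above, a tiny sketch of using the shared variable inside a graph expression:
In [ ]:
# the variable participates in the graph like any other tensor
doubled_shared = shared_vector_1 * 2
print(sess.run(doubled_shared))  # no feed_dict needed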
In [ ]:
# setting new value manually
sess.run(shared_vector_1.assign(np.arange(5)))
# getting that new value
print("new value", sess.run(shared_vector_1))
Tensorflow can compute the derivative of any graph for you (via tf.gradients), as long as it knows how to differentiate the elementary operations.
In [ ]:
my_scalar = tf.placeholder('float32')
scalar_squared = my_scalar ** 2
# a derivative of scalar_squared by my_scalar
derivative = tf.gradients(scalar_squared, [my_scalar])[0]
In [ ]:
x = np.linspace(-3, 3)
x_squared, x_squared_der = sess.run(
    [scalar_squared, derivative], {my_scalar: x})
plt.plot(x, x_squared, label="x^2")
plt.plot(x, x_squared_der, label="derivative")
plt.legend()
In [ ]:
my_vector = tf.placeholder('float32', [None])
# Compute the gradient of the next weird function over my_scalar and my_vector
# warning! Trying to understand the meaning of that function may result in permanent brain damage
weird_psychotic_function = tf.reduce_mean((my_vector+my_scalar)**(1+tf.nn.moments(my_vector, [0])[1]) + 1. / tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(
2*my_scalar**1.5)*(tf.reduce_sum(my_vector) * my_scalar**2)*tf.exp((my_scalar-4)**2)/(1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2))/(1+tf.exp(-(my_scalar-4)**2)))**2
der_by_scalar = <YOUR CODE: student.compute_grad_over_scalar()>
der_by_vector = <YOUR CODE: student.compute_grad_over_vector()>
In [ ]:
# Plotting your derivative
scalar_space = np.linspace(1, 7, 100)
y = [
    sess.run(weird_psychotic_function, {my_scalar: x, my_vector: [1, 2, 3]})
    for x in scalar_space]
plt.plot(scalar_space, y, label='function')

y_der_by_scalar = [
    sess.run(der_by_scalar, {my_scalar: x, my_vector: [1, 2, 3]})
    for x in scalar_space]
plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend()
In [ ]:
y_guess = tf.Variable(np.zeros(2, dtype='float32'))
y_true = tf.range(1, 3, dtype='float32')
loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2)
optimizer = tf.train.MomentumOptimizer(0.01, 0.9).minimize(loss, var_list=y_guess)
# same, but more detailed:
# updates = [[tf.gradients(loss,y_guess)[0], y_guess]]
# optimizer = tf.train.MomentumOptimizer(0.01,0.9).apply_gradients(updates)
In [ ]:
from IPython.display import clear_output
sess.run(tf.global_variables_initializer())
guesses = [sess.run(y_guess)]
for _ in range(100):
    sess.run(optimizer)
    guesses.append(sess.run(y_guess))

    clear_output(True)
    plt.plot(*zip(*guesses), marker='.')
    plt.scatter(*sess.run(y_true), c='red')
    plt.show()
Implement the regular logistic regression training algorithm
We shall train on a two-class MNIST dataset.
This is a binary classification problem, so we'll train Logistic Regression with a sigmoid output. $$P(y_i | X_i) = \sigma(W \cdot X_i + b) = { 1 \over {1+e^{- [W \cdot X_i + b]}} }$$
The natural choice of loss function is binary crossentropy (aka logloss, negative log-likelihood): $$ L = {1 \over N} \sum_{X_i, y_i} - \left[ y_i \cdot \log P(y_i | X_i) + (1-y_i) \cdot \log \left(1-P(y_i | X_i)\right) \right] $$
Mind the minus :)
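Before building the graph, here is a small numpy sketch of that loss, just to make the formula concrete (the function name is illustrative and not part of the assignment):
In [ ]:
def binary_crossentropy_reference(y_true, p_pred, eps=1e-9):
    # L = 1/N * sum of -[ y * log(p) + (1 - y) * log(1 - p) ]
    p_pred = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))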
In [ ]:
from sklearn.datasets import load_digits
X, y = load_digits(n_class=2, return_X_y=True)
print("y [shape - %s]:" % (str(y.shape)), y[:10])
print("X [shape - %s]:" % (str(X.shape)))
In [ ]:
print('X:\n', X[:3, :10])
print('y:\n', y[:10])
plt.imshow(X[0].reshape([8, 8]))
In [ ]:
# inputs and shareds
weights = <YOUR CODE: student.create_variable()>
input_X = <YOUR CODE: student.create_placeholder_matrix()>
input_y = <YOUR CODE: student.code_placeholder_vector()>
In [ ]:
predicted_y_proba = <YOUR CODE: predicted probabilities for input_X using weights>
loss = <YOUR CODE: logistic loss(scalar, mean over sample) between predicted_y_proba and input_y>
train_step = <YOUR CODE: operator that minimizes loss>
In [ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
In [ ]:
from sklearn.metrics import roc_auc_score
for i in range(5):
    loss_i, _ = sess.run([loss, train_step], <YOUR CODE: feed values to placeholders>)
    print("loss at iter %i: %.4f" % (i, loss_i))

    print("train auc:", roc_auc_score(
        y_train, sess.run(predicted_y_proba, {input_X: X_train})))
    print("test auc:", roc_auc_score(
        y_test, sess.run(predicted_y_proba, {input_X: X_test})))

print("resulting weights:")
plt.imshow(sess.run(weights).reshape(8, -1))  # sess.run fetches the variable's current value
plt.colorbar();
Your ultimate task for this week is to build your first neural network [almost] from scratch, in pure tensorflow.
This time you will solve the same digit recognition problem, but at a larger scale.
Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) NN should already give you an edge over logistic regression.
[bonus score] If you've already beaten logistic regression with a two-layer net, but your enthusiasm still isn't gone, you can try improving the test accuracy even further! The milestones would be 95% / 97.5% / 98.5% accuracy on the test set.
SPOILER! At the end of the notebook you will find a few tips and frequently made mistakes. If you feel strong enough to shoot yourself in the foot without external assistance, we encourage you to do so, but if you encounter any unsurpassable issues, please do look there before mailing us.
In [ ]:
from mnist import load_dataset
# [down]loading the original MNIST dataset.
# Please note that you should only train your NN on _train sample,
# _val can be used to evaluate out-of-sample error, compare models or perform early-stopping
# _test should be hidden under a rock until final evaluation... But we both know it is near impossible to catch you evaluating on it.
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
print(X_train.shape, y_train.shape)
In [ ]:
plt.imshow(X_train[0, 0])
In [ ]:
<this cell looks as if it wants you to create variables here>
In [ ]:
<you could just as well create a computation graph here - loss, optimizers, all that stuff>
In [ ]:
<this may or may not be a good place to run optimizer in a loop>
In [ ]:
<this may be a perfect cell to write a training & evaluation loop in>
In [ ]:
<predict & evaluate on test here, right? No cheating pls.>
Recommended pipeline
Add a hidden layer (see the sketch below). Now your logistic regression uses hidden neurons instead of inputs.
Now's the time to try improving the network. Consider layer sizes (neuron counts), nonlinearities, optimization methods, initialization: whatever you want, but please avoid convolutions for now.
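A minimal sketch of what one dense hidden layer could look like (purely illustrative: the shapes assume flattened 28x28 inputs and 100 hidden units, and none of the names are prescribed):
In [ ]:
# illustrative only: a single dense hidden layer on flattened 28x28 images
input_X = tf.placeholder('float32', shape=(None, 28 * 28))
W_hidden = tf.Variable((np.random.randn(28 * 28, 100) * 0.01).astype('float32'))
b_hidden = tf.Variable(np.zeros(100, dtype='float32'))
hidden = tf.nn.relu(tf.matmul(input_X, W_hidden) + b_hidden)
# the output layer (e.g. softmax over 10 digit classes) then consumes `hidden` instead of raw pixels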